Laboratorium 3

In [31]:
#tablice wielowymiarowe w Julii
Asmall=[[1.0 0.0 10]; [0.0 1.0 10]]
Bsmall=Asmall
Asmall
C = zeros(Float64, size(Asmall,1), size(Asmall,2))
Out[31]:
2×3 Array{Float64,2}:
 0.0  0.0  0.0
 0.0  0.0  0.0
In [26]:
size(Asmall)
Out[26]:
(2, 3)
In [14]:
# mnożenie macierzy - wersja naiwna, naiwna przez sprosob dostepu do pamieci
# najlepiej zeby dane byly blisko siebie w pamieci, przez sposob dzialania pamieci cache, ktora odczytuje dane 
# blokami zwanymi liniami cache. Odczytywanie obok siebie jest duzo szybsze.
# Pytniem jest jak jest tablica przechowywana w pamieci, zazwyczaj wierszami (C) ale w julli sa przechowywane
# kolumnami 
function naive_multiplication(A,B)
C=zeros(Float64,size(A,1),size(B,2))
  for i=1:size(A,1)
    for j=1:size(B,2)
        for k=1:size(A,2)
            C[i,j]=C[i,j]+A[i,k]*B[k,j]
        end
    end
end
C
end
Out[14]:
naive_multiplication (generic function with 1 method)
In [4]:
#kompilacja
naive_multiplication(Asmall,Bsmall)
Out[4]:
2×2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0
In [5]:
#kompilacja funkcji BLASowej do mnożenia macierzy
#https://docs.julialang.org/en/stable/stdlib/linalg/#BLAS-Functions-1
#to tez jest mnozenie macierzy ale zoptymalizowanymi funkcjami BLAS
Asmall*Bsmall
Out[5]:
2×2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0
In [6]:
A=rand(1000,1000); #tworzenie macierzy 1000 x 1000 z losowymi wartosciami 
B=rand(1000,1000);
In [7]:
# Należy pamiętać o "column-major" dostępie do tablic - 
# pierwszy indeks zmienia się szybciej
# tak jak Matlab, R, Fortran 
# inaczej niz C, Python
A1 = [[1 2]; [3 4]]
vec(A1)
Out[7]:
4-element Array{Int64,1}:
 1
 3
 2
 4
In [15]:
# poprawiona funkcja korzytająca z powyższego oraz z faktu, że
#można zmieniać kolejność operacji dodawania (a co za tym idzie kolejnosc petli).
# jest lepsza przez zamienienie kolejnosci petli i czesciej odczytuje elementy bedace blizej siebie 
function better_multiplication( A,B )
C=zeros(Float64,size(A,1),size(B,2))
  for j=1:size(B,2)
    for k=1:size(A,2)
        for i=1:size(A,1)
            C[i,j]=C[i,j]+A[i,k]*B[k,j]
        end
    end
end
C
end
Out[15]:
better_multiplication (generic function with 1 method)
In [9]:
better_multiplication(Asmall, Bsmall)
Out[9]:
2×2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0
In [10]:
@elapsed naive_multiplication(A,B) #mierzenie czasu
Out[10]:
3.69591414
In [11]:
@elapsed better_multiplication(A,B)
Out[11]:
1.600793835
In [12]:
@elapsed A*B #blas
Out[12]:
0.016316159
In [59]:
# aproksymacja sredniokwadratowa wielomianem - tutaj przyklad dla wielomianu 3 stopnia
# pakiet Polynomials jest mozliwy do instalacji pod Juliabox
# https://github.com/JuliaMath/Polynomials.jl


using Polynomials
xs = 0:3; ys = [1,3,4,5]
fit1=polyfit(xs, ys,3)

# po prostu za xs podstawic to co mam za wielkosc macierzy a za ys podstawiac po kolei te zmienne dla których
# chce wyliczyc wielomian
Out[59]:
1.0 + 2.8333333333333335∙x - 0.9999999999999999∙x^2 + 0.16666666666666663∙x^3
In [14]:
# obliczanie wartosci wielomianu 
fit1(1)
Out[14]:
836.4071935534389
In [15]:
# obliczanie wartosci wielomianu (drugi sposób)
polyval(fit1, 1)
Out[15]:
836.4071935534389
In [60]:
using Plots

# geste punkty do wyliczenia wartosci wielomianu aproksymujacego:
xd=0:0.1:10
# wykres wartosci wielomianu dla gestych punktow:
plot(xd,polyval(fit1, xd))

# ! -dodanie do tego samego wykresu punktów wg ktorych aproksymowalismy
scatter!(xs,ys)
Out[60]:
0.0 2.5 5.0 7.5 10.0 0 20 40 60 80 y1 y2

Zadania

1.Uruchomić

  • naive_multiplication(A,B),
  • better_multiplication(A,B)
  • mnożenie BLAS w Julii (A*B)

dla coraz większych macierzy i zmierzyć czasy. Narysować wykres zależyności czasu od rozmiaru macierzy wraz z słupkami błędów, tak jak na poprzednim laboratorium. Wszystkie trzy metody powinny być na jednym wykresie.

2.Napisać w języku C:

- naiwną metodę mnożenia macierzy (wersja 1) 
- ulepszoną za pomocą zamiany pętli metodę mnożenia macierzy (wersja 2), pamiętając, że w C macierz przechowywana jest wierszami (row major order tzn A11,A12, ..., A1m, A21, A22,...,A2m, ..Anm), inaczej niż w Julii ! 
- skorzystać z  możliwości BLAS dostępnego w GSL(wersja 3). 

Należy porównywać działanie tych trzech algorytmow bez włączonej opcji optymalizacji kompilatora. Przedstawić wyniki na jednym wykresie tak jak w p.1.(osobno niż p.1). (Dla chętnych) sprawdzić, co się dzieje, jak włączymy optymalizację kompilatora i dodać do wykresu.

3.Użyć funkcji polyfit z pakietu Polynomials do znalezienia odpowiednich wielomianow, ktore najlepiej pasują do zależności czasowych kazdego z algorytmow. Stopień wielomianu powinien zgadzać się z teoretyczną złożonoscią. Dodać wykresy uzyskanych wielomianow do wczesniejszych wykresów.

In [16]:
columns_and_rows = Int64[]
naive_time = Float64[]
better_time = Float64[]
blas_time = Float64[]

nb_of_tests = 10
i = 50
while(i <= 1000)
    for k=0:nb_of_tests
        A = rand(i,i)
        B = rand(i,i)
        push!(columns_and_rows, i)
        push!(naive_time,@elapsed naive_multiplication(A,B))
        push!(better_time,@elapsed better_multiplication(A,B))
        push!(blas_time,@elapsed A * B)
    end
    i += 50
end
columns_and_rows
Out[16]:
220-element Array{Int64,1}:
   50
   50
   50
   50
   50
   50
   50
   50
   50
   50
   50
  100
  100
    ⋮
  950
 1000
 1000
 1000
 1000
 1000
 1000
 1000
 1000
 1000
 1000
 1000
In [18]:
using DataFrames

df = DataFrame()
df[:columns_and_rows]= columns_and_rows
df[:naive_time] = naive_time
df[:better_time] = better_time
df[:blas_time] = blas_time
Out[18]:
220-element Array{Float64,1}:
 0.445678423
 2.42e-5    
 1.8701e-5  
 2.19e-5    
 2.24e-5    
 1.99e-5    
 2.24e-5    
 2.21e-5    
 4.9201e-5  
 2.08e-5    
 2.22e-5    
 0.000635004
 8.0501e-5  
 ⋮          
 0.011140474
 0.017180114
 0.015388102
 0.013482489
 0.021169041
 0.013324388
 0.013693191
 0.016768711
 0.013379489
 0.014136793
 0.01671251 
 0.01357489 
In [19]:
using Statistics
df2 = DataFrame(by(df, [:columns_and_rows],
    :naive_time => mean,
    :naive_time => std,
    :better_time => mean,
    :better_time => std,
    :blas_time => mean,   
    :blas_time => std))
Out[19]:

20 rows × 7 columns

columns_and_rowsnaive_time_meannaive_time_stdbetter_time_meanbetter_time_stdblas_time_meanblas_time_std
Int64Float64Float64Float64Float64Float64Float64
1500.002580370.007136010.002336550.006965210.04053840.13437
21000.003339749.22051e-50.001577796.40788e-50.009018620.0294275
31500.03713510.03552130.01918120.02984980.01395550.0305566
42000.09284410.02169860.03928090.03527720.01428360.0308874
52500.1781350.005789420.02236720.0006587480.0005029946.55528e-5
63000.182990.02254360.04025490.001989670.0007932230.000113745
73500.2396350.01313530.06370230.001963390.001445690.00106744
84000.303820.009266580.09268780.003415620.001620190.00131963
94500.4089570.02093030.1351730.006499570.001891650.00048213
105000.5085450.01188910.1837530.004897860.002574290.00122537
115500.650910.01110290.2420950.006085520.005816480.0023734
126000.8181390.03808250.3159040.007635550.00490590.001418
136501.011970.02165120.4004140.007305410.005220240.00214967
147001.242730.04262290.5097180.01360650.006103150.00172784
157501.5170.04485170.6229550.01293830.008353360.00332808
168001.810980.03422950.7423720.01003730.01138480.000878231
178502.182620.04057810.9231560.01749810.009849640.00239953
189002.524770.06913411.078290.02711350.01025810.00175947
199502.950790.08348881.26610.02741160.01256850.00226778
2010003.42930.0760561.489920.02958360.01534630.00244791
In [20]:
using Plots

ydata = scatter(df2[:columns_and_rows],
    [df2[:naive_time_mean] df2[:better_time_mean] df2[:blas_time_mean]], 
    yerr = [df2[:naive_time_std] df2[:better_time_std] df2[:blas_time_std]],
    labels = ["Naive Time" "Better time" "BLAS time"],
    title = "Time of mulitplication in Julia",
    legend=:topleft,
    xlabel = "Size of matrix",
    ylabel = "Time [s]",)
Out[20]:
200 400 600 800 1000 0 1 2 3 Time of mulitplication in Julia Size of matrix Time [s] Naive Time Better time BLAS time
In [21]:
using CSV
input0="resultO0.csv"
mydata0=CSV.read(input0, delim=";")
input1="resultO1.csv"
mydata1=CSV.read(input1, delim=";")
input2="resultO2.csv"
mydata2=CSV.read(input2, delim=";")
input3="resultO3.csv"
mydata3=CSV.read(input3, delim=";")
Out[21]:

200 rows × 4 columns

columns_and_rowsnaive_timebetter_timeblas_time
Int64⍰Float64⍰Float64⍰Float64⍰
1509.8e-56.7e-59.7e-5
2509.4e-56.3e-50.000103
3509.3e-57.3e-59.5e-5
4500.0001046.2e-59.2e-5
5509.3e-56.3e-50.000129
6509.3e-56.2e-59.2e-5
7509.3e-56.1e-59.1e-5
8509.3e-56.4e-50.000112
9509.3e-56.2e-50.000124
10500.0001547.6e-50.000114
111000.0009440.000470.000702
121000.0008470.0006320.00102
131000.0012980.0006950.001074
141000.0009660.0004490.000664
151000.0008190.0004720.000645
161000.0008160.0004480.000669
171000.0008140.0004370.00071
181000.0008960.0004750.000754
191000.0008050.0005040.000652
201000.0008130.0004370.00067
211500.0028340.0015250.002231
221500.0029140.0015240.002104
231500.0028660.0014680.002066
241500.0027450.001430.002067
251500.002790.0014670.002049
261500.0027520.0014160.002069
271500.0027690.0014190.002067
281500.0027570.0014090.002065
291500.0027680.0014190.002056
301500.0027980.0014330.002046
In [23]:
using Statistics, DataFrames, Plots
data0 = DataFrame(by(mydata0, [:columns_and_rows],
    :naive_time => mean,
    :naive_time => std,
    :better_time => mean,
    :better_time => std,
    :blas_time => mean,   
    :blas_time => std))
data1 = DataFrame(by(mydata1, [:columns_and_rows],
    :naive_time => mean,
    :naive_time => std,
    :better_time => mean,
    :better_time => std,
    :blas_time => mean,   
    :blas_time => std))
data2 = DataFrame(by(mydata2, [:columns_and_rows],
    :naive_time => mean,
    :naive_time => std,
    :better_time => mean,
    :better_time => std,
    :blas_time => mean,   
    :blas_time => std))
data3 = DataFrame(by(mydata3, [:columns_and_rows],
    :naive_time => mean,
    :naive_time => std,
    :better_time => mean,
    :better_time => std,
    :blas_time => mean,   
    :blas_time => std))
Out[23]:

20 rows × 7 columns

columns_and_rowsnaive_time_meannaive_time_stdbetter_time_meanbetter_time_stdblas_time_meanblas_time_std
Int64⍰Float64Float64Float64Float64Float64Float64
1500.00010081.9031e-56.53e-55.16505e-60.00010491.40194e-5
21000.00090180.0001509580.00050198.88075e-50.0007560.000157205
31500.00279935.55479e-50.0014514.36552e-50.0020825.46809e-5
42000.00702926.93282e-50.00337281.62535e-50.00496692.67725e-5
52500.01387678.9614e-50.00653726.45838e-50.00959245.47158e-5
63000.02426170.00013520.01116698.02641e-50.01646379.61157e-5
73500.04340190.0005198110.01762530.0002023120.02610960.00010107
84000.0592220.0002718670.02582320.0001768430.03892190.000295755
94500.08971980.002984830.03654320.0003744390.05553860.00123746
105000.1326880.0001738860.0501580.0001040830.07499960.000275655
115500.1890490.02187150.06636370.0005428220.1001580.000844697
126000.2390290.02075610.0860830.001180620.1299530.00171527
136500.2962710.003972980.1092320.0007353910.1649190.00266135
147000.4070730.05604030.1414040.005800820.20560.00280436
157500.4639160.07125140.1733450.007942570.2505330.00205358
168000.5628290.004346850.2078490.0008723080.3077810.00391193
178500.7387290.03058030.2660750.005050220.3868430.00665715
189001.540440.06140940.3515710.003531880.5144940.0102056
199501.449310.113670.4410020.003832810.6216480.00408937
2010003.430110.09967680.5139760.001121160.7357520.00408139
In [24]:
plotO0 = scatter(data0[:columns_and_rows],
    [data0[:naive_time_mean] data0[:better_time_mean] data0[:blas_time_mean]], 
    yerr = [data0[:naive_time_std] data0[:better_time_std] data0[:blas_time_std]],
    labels = ["Naive Time" "Better time" "BLAS time"],
    title = "Time of mulitplication in C, with O0 optim",
    legend=:topleft,
    xlabel = "Size of matrix",
    ylabel = "Time [s]")
plotO1 = scatter(data1[:columns_and_rows],
    [data1[:naive_time_mean] data1[:better_time_mean] data1[:blas_time_mean]], 
    yerr = [data1[:naive_time_std] data1[:better_time_std] data1[:blas_time_std]],
    labels = ["Naive Time" "Better time" "BLAS time"],
    title = "Time of mulitplication in C, with O1 optim",
    legend=:topleft,
    xlabel = "Size of matrix",
    ylabel = "Time [s]")
plotO2 = scatter(data2[:columns_and_rows],
    [data2[:naive_time_mean] data2[:better_time_mean] data2[:blas_time_mean]], 
    yerr = [data2[:naive_time_std] data2[:better_time_std] data2[:blas_time_std]],
    labels = ["Naive Time" "Better time" "BLAS time"],
    title = "Time of mulitplication in C, with O2 optim",
    legend=:topleft,
    xlabel = "Size of matrix",
    ylabel = "Time [s]")

plotO3 = scatter(data3[:columns_and_rows],
    [data3[:naive_time_mean] data3[:better_time_mean] data3[:blas_time_mean]], 
    yerr = [data3[:naive_time_std] data0[:better_time_std] data3[:blas_time_std]],
    labels = ["Naive Time" "Better time" "BLAS time"],
    title = "Time of mulitplication in C, with O3 optim",
    legend=:topleft,
    xlabel = "Size of matrix",
    ylabel = "Time [s]")


naive_compare = scatter([data0[:columns_and_rows] data1[:columns_and_rows] data2[:columns_and_rows]],
    [data0[:naive_time_mean] 
     data1[:naive_time_mean] 
     data2[:naive_time_mean] ],
    yerr = 
    [data0[:naive_time_std] 
     data1[:naive_time_std] 
     data2[:naive_time_std] ],
    labels = ["Naive Time O0" "Naive Time O1" "Naive Time O2"],
    title = "naive in optimalization dependency",
    legend=:topleft,
    xlabel = "Size of matrix",
    ylabel = "Time [s]",)


display(plotO0)
display(plotO1)
display(plotO2)
display(plotO3)

display(naive_compare)
200 400 600 800 1000 0 5 10 15 Time of mulitplication in C, with O0 optim Size of matrix Time [s] Naive Time Better time BLAS time
200 400 600 800 1000 0 1 2 3 4 5 Time of mulitplication in C, with O1 optim Size of matrix Time [s] Naive Time Better time BLAS time
200 400 600 800 1000 0 1 2 3 Time of mulitplication in C, with O2 optim Size of matrix Time [s] Naive Time Better time BLAS time
200 400 600 800 1000 0 1 2 3 Time of mulitplication in C, with O3 optim Size of matrix Time [s] Naive Time Better time BLAS time
200 400 600 800 1000 0 5 10 15 naive in optimalization dependency Size of matrix Time [s] Naive Time O0 Naive Time O1 Naive Time O2
In [75]:
using Polynomials, Plots

fit_data0_naive=polyfit(data0[:columns_and_rows],data0[:naive_time_mean],3)
xd=0:0.01:1000
plot(xd,polyval(fit_data0_naive, xd))
scatter!(data0[:columns_and_rows],data0[:naive_time_mean])
scatter!(labels = ["Naive Time O0" "Polynomial approximation"],
legend=:topleft,xlabel = "Size of matrix", ylabel = "Time [s]", title = "Naive in C")
#print(fit_data0_naive)
Out[75]:
0 250 500 750 1000 0 5 10 15 Naive in C Size of matrix Time [s] y1 y2
In [81]:
fit_data0_better=polyfit(data0[:columns_and_rows],data0[:better_time_mean],3)
xd=0:0.01:1000
plot(xd,polyval(fit_data0_better, xd))
scatter!(data0[:columns_and_rows],data0[:naive_time_mean])
scatter!(labels = ["Naive Time O0" "Polynomial approximation"],
legend=:topleft,xlabel = "Size of matrix", ylabel = "Time [s]", title = "Better in C")
#print(fit_data0_better)
Out[81]:
0 250 500 750 1000 0 5 10 15 Better in C Size of matrix Time [s] y1 y2
In [76]:
fit_data0_blas=polyfit(data0[:columns_and_rows],data0[:blas_time_mean],3)
xd=0:0.01:1000
plot(xd,polyval(fit_data0_blas, xd))
scatter!(data0[:columns_and_rows],data0[:naive_time_mean])
scatter!(labels = ["Naive Time O0" "Polynomial approximation"],
legend=:topleft,xlabel = "Size of matrix", ylabel = "Time [s]", title = "BLAS in C")
#print(fit_data0_blas)
Out[76]:
0 250 500 750 1000 0 5 10 15 BLAS in C Size of matrix Time [s] y1 y2
In [77]:
fit_julia_naive=polyfit(df2[:columns_and_rows],df2[:naive_time_mean],3)
xd=0:0.01:1000
plot(xd,polyval(fit_data0_naive, xd))
scatter!(data0[:columns_and_rows],data0[:naive_time_mean])
scatter!(labels = ["Naive Time O0" "Polynomial approximation"],
legend=:topleft,xlabel = "Size of matrix", ylabel = "Time [s]", title = "Naive in Julia")
#print(fit_julia_naive)
Out[77]:
0 250 500 750 1000 0 5 10 15 Naive in Julia Size of matrix Time [s] y1 y2
In [78]:
fit_julia_better=polyfit(df2[:columns_and_rows],df2[:better_time_mean],3)
xd=0:0.01:1000
plot(xd,polyval(fit_data0_better, xd))
scatter!(data0[:columns_and_rows],data0[:naive_time_mean])
scatter!(labels = ["Naive Time O0" "Polynomial approximation"],
legend=:topleft,xlabel = "Size of matrix", ylabel = "Time [s]", title = "Better in Julia")
#print(fit_julia_better)
Out[78]:
0 250 500 750 1000 0 5 10 15 Better in Julia Size of matrix Time [s] y1 y2
In [79]:
fit_julia_blas=polyfit(df2[:columns_and_rows],df2[:blas_time_mean],3)
xd=0:0.01:1000
plot(xd,polyval(fit_data0_blas, xd))
scatter!(data0[:columns_and_rows],data0[:naive_time_mean])
scatter!(labels = ["Naive Time O0" "Polynomial approximation"],
legend=:topleft,xlabel = "Size of matrix", ylabel = "Time [s]", title = "BLAS in Julia")
#print(fit_julia_blas)
Out[79]:
0 250 500 750 1000 0 5 10 15 BLAS in Julia Size of matrix Time [s] y1 y2